AITopics | nadaraya-watson regression

nadaraya-watson regression

Towards a Relationship-Aware Transformer for Tabular Data

Konstantinov, Andrei V., Zuev, Valerii A., Utkin, Lev V.

arXiv.org Artificial Intelligence, Dec-9-2025

Deep learning models for tabular data typically do not allow a graph of external dependencies between samples to be imposed, although such a graph can be useful for accounting for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, making them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism, which accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and with gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
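
A minimal sketch of the idea described in the abstract above, not the authors' model. The abstract says only that a term is added to the attention matrix; this sketch assumes the term is a bias added to the scaled dot-product scores before the softmax. The names `relation_aware_attention` and `relation_bias`, the toy data, and the bias value are all illustrative assumptions.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relation_aware_attention(queries, keys, values, relation_bias):
    """Attention across samples with an additive relationship term.

    relation_bias[i][j] is a hypothetical score expressing how strongly
    sample i is related to sample j (0.0 means no known relationship).
    Values are scalars here to keep the example short.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d) + relation_bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        outputs.append(sum(w * v for w, v in zip(weights, values)))
    return outputs

# Toy data: three samples with two features each and scalar targets.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.0, 10.0, 20.0]

no_bias = [[0.0] * 3 for _ in range(3)]
# Assumed external graph: samples 0 and 2 are related.
linked = [[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]

plain = relation_aware_attention(features, features, targets, no_bias)
biased = relation_aware_attention(features, features, targets, linked)
```

With the bias in place, the output for sample 0 is pulled toward the target of its related sample 2, while sample 1, which has no relationships, is unaffected.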

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.0731

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel

Zheng, Chuanyang, Sun, Jiankai, Gao, Yihang, Xie, Enze, Wang, Yuehao, Wang, Peihao, Xu, Ting, Chang, Matthew, Ren, Liliang, Li, Jingyao, Xiong, Jing, Rasul, Kashif, Schwager, Mac, Schneider, Anderson, Wang, Zhangyang, Nevmyvaka, Yuriy

arXiv.org Artificial Intelligence, Oct-15-2025

Mixture-of-Experts (MoE) has become a cornerstone of recent state-of-the-art large language models (LLMs). Traditionally, MoE relies on Softmax as the router score function to aggregate expert outputs, a design choice that has persisted from the earliest MoE models to modern LLMs and is now widely regarded as standard practice. However, the necessity of using Softmax to project router weights onto a probability simplex remains an unchallenged assumption rather than a principled design choice. In this work, we first revisit the classical Nadaraya-Watson regression and observe that MoE shares the same mathematical formulation as Nadaraya-Watson regression. Furthermore, we show that both the feed-forward neural network (FFN) and MoE can be interpreted as special cases of Nadaraya-Watson regression, where the kernel function corresponds to the input neurons of the output layer. Motivated by these insights, we propose the zero-additional-cost Kernel Inspired Router with Normalization (KERN), an FFN-style router function, as an alternative to Softmax. We demonstrate that this router generalizes both Sigmoid- and Softmax-based routers. Based on empirical observations and established practices in FFN implementation, we recommend the use of ReLU activation and $\ell_2$-normalization in the KERN router function. Comprehensive experiments in MoE and LLM validate the effectiveness of the proposed FFN-style router function KERN.
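
The shared formulation mentioned in the abstract above can be illustrated with a small sketch. Nadaraya-Watson regression predicts a kernel-weighted average of training targets, and a Softmax-routed mixture predicts a gate-weighted average of expert outputs. With a Gaussian kernel, the normalized kernel weights are exactly a softmax over negative scaled squared distances, so the two coincide. This is not the KERN router; the constant "experts", the function names, and the toy data are assumptions made for illustration.

```python
import math

def squared_distance(x, xi):
    return sum((a - b) ** 2 for a, b in zip(x, xi))

def nadaraya_watson(x, train_x, train_y, bandwidth=1.0):
    """Kernel-weighted average of training targets (Gaussian kernel)."""
    kernels = [
        math.exp(-squared_distance(x, xi) / (2 * bandwidth ** 2)) for xi in train_x
    ]
    total = sum(kernels)
    return sum(k * y for k, y in zip(kernels, train_y)) / total

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_routed_mixture(x, centers, expert_outputs, bandwidth=1.0):
    """Softmax-gated mixture of constant 'experts'.

    The router score of expert i is the negative scaled squared distance
    between x and a center assigned to that expert (an assumed router).
    """
    scores = [-squared_distance(x, c) / (2 * bandwidth ** 2) for c in centers]
    gates = softmax(scores)
    return sum(g * e for g, e in zip(gates, expert_outputs))

train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
query = [1.5]

nw_prediction = nadaraya_watson(query, train_x, train_y, bandwidth=0.5)
moe_prediction = softmax_routed_mixture(query, train_x, train_y, bandwidth=0.5)
```

The two predictions agree up to floating-point rounding, because dividing each Gaussian kernel value by their sum is the same operation as applying a softmax to the exponents.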

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.25913

Country:

Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Note on Doubly Robust Estimator in Regression Continuity Designs

Kato, Masahiro

arXiv.org Machine Learning, Dec-2-2024

This note introduces a doubly robust (DR) estimator for regression discontinuity (RD) designs. RD designs provide a quasi-experimental framework for estimating treatment effects, where treatment assignment depends on whether a running variable surpasses a predefined cutoff. A common approach in RD estimation is the use of nonparametric regression methods, such as local linear regression. However, the validity of these methods still relies on the consistency of the nonparametric estimators. In this study, we propose the DR-RD estimator, which combines two distinct estimators for the conditional expected outcomes. The primary advantage of the DR-RD estimator lies in its ability to ensure the consistency of the treatment effect estimation as long as at least one of the two estimators is consistent. Consequently, our DR-RD estimator enhances the robustness of treatment effect estimators in RD designs.

estimator, rd design, regression, (13 more...)

arXiv.org Machine Learning

2411.07978

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

Flexible conditional density estimation for time series

Grivol, Gustavo, Izbicki, Rafael, Okuno, Alex A., Stern, Rafael B.

arXiv.org Artificial Intelligence, Jan-23-2023

This paper introduces FlexCodeTS, a new conditional density estimator for time series. FlexCodeTS is a flexible nonparametric conditional density estimator, which can be based on an arbitrary regression method. It is shown that FlexCodeTS inherits the rate of convergence of the chosen regression method. Hence, FlexCodeTS can adapt its convergence rate by employing the regression method that best fits the structure of the data. From an empirical perspective, FlexCodeTS is compared to NNKCDE and GARCH on both simulated and real data. FlexCodeTS is shown to generally obtain the best performance among the selected methods according to either the CDE loss or the pinball loss.

artificial intelligence, flexcodet, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.09671

Country:

North America > United States > New York (0.04)
South America > Brazil > São Paulo (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

BENK: The Beran Estimator with Neural Kernels for Estimating the Heterogeneous Treatment Effect

Kirpichenko, Stanislav R., Utkin, Lev V., Konstantinov, Andrei V.

arXiv.org Artificial Intelligence, Nov-19-2022

A method called BENK (the Beran Estimator with Neural Kernels) is proposed for estimating the conditional average treatment effect from censored time-to-event data. The main idea behind the method is to apply the Beran estimator to estimate the survival functions of controls and treatments. Instead of the typical kernel functions in the Beran estimator, it is proposed to implement kernels as neural networks of a specific form, called neural kernels. The conditional average treatment effect is estimated by using the survival functions as outcomes of the control and treatment neural networks, which consist of a set of neural kernels with shared parameters. The neural kernels are more flexible and can accurately model a complex location structure of feature vectors. Various numerical simulation experiments illustrate BENK and compare it with the well-known T-learner, S-learner and X-learner for several types of control and treatment outcome functions based on the Cox models, the random survival forest and the Nadaraya-Watson regression with Gaussian kernels. The code for the proposed algorithms implementing BENK is available at https://github.com/Stasychbr/BENK.
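
For readers unfamiliar with the Beran estimator named in the abstract above, here is a minimal sketch of its standard form, a kernel-weighted generalization of the Kaplan-Meier estimator. It uses a plain Gaussian kernel rather than the neural kernels that BENK proposes, assumes distinct observation times, and the function names and toy data are illustrative assumptions, not the code from the linked repository.

```python
import math

def kernel_weights(x, train_x, bandwidth):
    """Normalized Gaussian (Nadaraya-Watson) weights of training points at x."""
    kernels = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * bandwidth ** 2))
        for xi in train_x
    ]
    total = sum(kernels)
    return [k / total for k in kernels]

def beran_survival(t, x, train_x, times, events, bandwidth=1.0):
    """Estimate S(t | x) from right-censored data.

    events[i] is 1 if the event was observed for sample i and 0 if the
    sample was censored. Observation times are assumed to be distinct.
    """
    weights = kernel_weights(x, train_x, bandwidth)
    order = sorted(range(len(times)), key=lambda i: times[i])
    survival = 1.0
    earlier_weight = 0.0  # total weight of samples with smaller times
    for i in order:
        if times[i] > t:
            break
        remaining = 1.0 - earlier_weight
        if remaining <= 0.0:
            break
        if events[i]:
            # Censored samples skip this factor and only shrink the risk set.
            survival *= max(0.0, 1.0 - weights[i] / remaining)
        earlier_weight += weights[i]
    return survival

train_x = [[0.0], [1.0], [2.0], [3.0]]
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 0, 1, 1]

# A very large bandwidth makes the weights nearly uniform, which
# recovers the Kaplan-Meier estimate for the whole sample.
near_km = beran_survival(6.5, [0.0], train_x, times, events, bandwidth=1e6)
local = beran_survival(6.5, [0.0], train_x, times, events, bandwidth=1.0)
```

With a small bandwidth the estimate at a feature vector is dominated by nearby training points, which is what makes the survival function conditional on the features.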

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.10793

Country:

Asia > Russia (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New Jersey (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models

Konstantinov, Andrei V., Utkin, Lev V.

arXiv.org Artificial Intelligence, Oct-11-2022

New models of attention-based random forests called LARF (Leaf Attention-based Random Forest) are proposed. The first idea behind the models is to introduce a two-level attention, where one of the levels is the "leaf" attention, in which the attention mechanism is applied to every leaf of the trees. The second level is the tree attention, which depends on the "leaf" attention. The second idea is to replace the softmax operation in the attention with a weighted sum of softmax operations with different parameters. It is implemented by applying a mixture of Huber's contamination models and can be regarded as an analog of multi-head attention with "heads" defined by selecting a value of the softmax parameter. Attention parameters are trained simply by solving a quadratic optimization problem. To simplify the tuning process of the models, it is proposed to make the contamination parameters trainable and to compute them by solving the quadratic optimization problem. Many numerical experiments with real datasets are performed to study LARFs. The code for the proposed algorithms can be found at https://github.com/andruekonst/leaf-attention-forest.
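
A rough sketch of the two ingredients named in the abstract above, under stated assumptions. Huber's contamination model mixes a base distribution p with an arbitrary distribution q as (1 - eps) * p + eps * q. The weighted sum of softmax operations is shown here with a hypothetical scale parameter `tau` per "head". The exact parameterization used in LARF is not given in the abstract, so the names and the toy numbers are illustrative only.

```python
import math

def softmax(scores, tau=1.0):
    """Softmax with a scale parameter tau (assumed form of the 'softmax parameter')."""
    scaled = [tau * s for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def contaminate(p, q, eps):
    """Huber's eps-contamination: a convex combination of p and q."""
    return [(1.0 - eps) * pi + eps * qi for pi, qi in zip(p, q)]

def softmax_mixture(scores, taus, mix_weights):
    """Weighted sum of softmax outputs, one per value of tau.

    mix_weights must be non-negative and sum to one, so the result is
    again a valid attention distribution.
    """
    heads = [softmax(scores, tau) for tau in taus]
    return [
        sum(w * head[j] for w, head in zip(mix_weights, heads))
        for j in range(len(scores))
    ]

scores = [0.2, -1.0, 1.5]
attention = softmax_mixture(scores, taus=[0.1, 1.0, 10.0], mix_weights=[0.2, 0.5, 0.3])
uniform = [1.0 / 3] * 3
contaminated = contaminate(softmax(scores), uniform, eps=0.25)
```

In this sketch the attention weights are linear in `mix_weights`, so a squared-error loss on an attention-weighted prediction is quadratic in them, which fits the abstract's remark that training reduces to a quadratic optimization problem.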

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.05168

Country:

Asia > Russia (0.28)
North America > United States > New York (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Europe > France > Hauts-de-France > Oise > Compiègne (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression

Konstantinov, Andrei V., Kirpichenko, Stanislav R., Utkin, Lev V.

arXiv.org Artificial Intelligence, Jul-19-2022

Efficient treatment of a patient with his or her clinical and other characteristics [1, 2] can be regarded as an important goal of real personalized medicine. The goal can be achieved by means of machine learning methods, owing to the increasing amount of available electronic health records, which are a basis for developing accurate models. To estimate the treatment effect, patients are divided into two groups, called treatment and control, and patients from the different groups are then compared. One of the popular measures of efficient treatment used in machine learning models is the average treatment effect (ATE) [3], which is estimated from observed patient data as the mean difference between the outcomes of patients in the treatment and control groups. Because patients differ in their characteristics and in their responses to a particular treatment, the treatment effect is measured by the conditional average treatment effect (CATE), or heterogeneous treatment effect (HTE), defined as the ATE conditional on a patient feature vector [4, 5, 6, 7]. Two main problems can be pointed out when CATE is estimated. The first one is that the control group is usually larger than the treatment group. As a result, we face the problem of a small training dataset, which does not allow us to directly apply many efficient machine learning methods.
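
The two quantities defined in the passage above can be sketched in a few lines. The ATE is the difference between the mean outcomes of the two groups. For the CATE, this sketch assumes a simple approach in which one Nadaraya-Watson regressor with a fixed Gaussian kernel is fitted per group and the two predictions are subtracted. That is a baseline chosen for illustration, not the trained-kernel method of the paper, and the function names and toy data are hypothetical.

```python
import math

def average_treatment_effect(outcomes, treated):
    """Mean outcome of the treatment group minus mean outcome of the control group."""
    treat = [y for y, t in zip(outcomes, treated) if t]
    control = [y for y, t in zip(outcomes, treated) if not t]
    return sum(treat) / len(treat) - sum(control) / len(control)

def nadaraya_watson(x, train_x, train_y, bandwidth):
    """Gaussian-kernel weighted average of training outcomes at x."""
    kernels = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * bandwidth ** 2))
        for xi in train_x
    ]
    total = sum(kernels)
    return sum(k * y for k, y in zip(kernels, train_y)) / total

def conditional_ate(x, features, outcomes, treated, bandwidth=0.5):
    """CATE at x as the difference of two group-wise regressions."""
    treat_x = [f for f, t in zip(features, treated) if t]
    treat_y = [y for y, t in zip(outcomes, treated) if t]
    control_x = [f for f, t in zip(features, treated) if not t]
    control_y = [y for y, t in zip(outcomes, treated) if not t]
    return (nadaraya_watson(x, treat_x, treat_y, bandwidth)
            - nadaraya_watson(x, control_x, control_y, bandwidth))

# Toy data: four control patients and only two treated patients,
# with a treatment effect that grows with the single feature.
features = [[0.0], [1.0], [2.0], [3.0], [0.5], [2.5]]
treated = [False, False, False, False, True, True]
outcomes = [0.0, 1.0, 2.0, 3.0, 1.5, 5.5]

ate = average_treatment_effect(outcomes, treated)
cate_low = conditional_ate([0.5], features, outcomes, treated)
cate_high = conditional_ate([2.5], features, outcomes, treated)
```

The single ATE number hides the fact that the estimated effect differs between the two feature values, which is the motivation for conditioning on the patient feature vector.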

nadaraya-watson regression, neural network, treatment effect, (14 more...)

arXiv.org Artificial Intelligence

2207.09139

Country:

North America > United States (0.14)
Asia > Russia (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
